[57] ABSTRACT

Voice activation of functions on a network such as the
Internet is accomplished using a speech recognition system
running synchronously with standard desktop-based Internet
functions. This synchronous operation allows voice-based
control to be exercised for all operations on the Internet.
System functions are based on a unique combination of a
local web browser, a remotely-located speech/web server,
and control links between a web browser and a speech/web
server. The control links provide a mechanism for controlling
a speech server from a web page and a mechanism for
driving both the local, as well as a remote, web browser.

19 Claims, 7 Drawing Sheets 

USING SPEECH RECOGNITION TO ACCESS
THE INTERNET, INCLUDING ACCESS VIA A
TELEPHONE

TITLE OF THE INVENTION

Method and System for Using Speech Recognition to
Access the Internet, including Access Via a Telephone.

FIELD OF THE INVENTION

The present invention relates to the field of computerized
[...] control of telephony [...] through a unique combination
of speech server, web [...] page and a mechanism for driving
both the local, as well as a remote, web browser.

BACKGROUND OF THE INVENTION

The Internet is essentially a network of servers containing
information that users can obtain using personal computers.
Users generally connect to a server, a computer equipped
with information and capabilities [...] on the Internet, using
[...] Windows-based software. The user's navigation of the
Internet [...] with functions activated using a mouse.

Speech recognition software and hardware for use in
conjunction with personal computers and other
environments, like the Internet, is a rapidly developing
technology. With speech recognition, a [...] commands
are recognized by a computer and then converted,
based on the speech pattern, into an electronic signal. For
example, speech recognition has been highly successful in
the field of long distance telephone calling for the purpose
of allowing collect calls. Typically, with this application, a
[...] recognition [...] selecting [...]
speaker dependent and independent dictation machines,
continuous speech systems, large vocabulary systems, and
small vocabulary systems. Further, these systems can be
Windows based, Macintosh based, UNIX based, Windows
NT based, or based on another platform, depending on the
preferred operating system.

Speech recognition operating in conjunction with computer
connection with the Internet, also known as speech
enabling of the Internet, appears to have promising
application possibilities. One possible application of this
technology [...] activated by a single command. For a voice
macro, the speech server's recognition of an inputted voice
command [...] of commands.

Prior art methods for speech-enabling the Internet
have been explored by various companies and research
entities. In general terms, researchers have approached the
problem from either the perspective of speech-enabling the
Internet, or from the perspective of Internet-enabling the
telephone. [...] is the most common approach and the
one being pursued by Texas Instruments, Apple Computer
and Microsoft. In this approach, the speech recognition
engine is located on the local host along with the web
[...] approach, such as those [...] voice [...] functions
can be used when browsing the Internet.

Texas Instruments further refined this approach by using
[...] associated with hotlinks to supply the vocabularies
for the recognizer. Apple has taken the approach of making
[...] allow them to speech-enable their web [...] provide a
mechanism for supplying the [...] grammars and their
speech synthesizers [...]

The advantages [...] no additional requirements of the
user's computer, such [...] no additional requirements [...]
system [...] starting with an immediate utility with [...];
and (4) direct benefits are available [...] integration.

[...] being investigated as a research effort. Demonstrations
from MIT and the Sun SpeechActs group [...] using a
speech-only [...] for using the [...]

SUMMARY OF THE INVENTION

[...] networks such as the Internet and the [...] World Wide
Web [...] system to provide for speech access [...] telephone
lines and control telephony functions through standard web
pages. These [...] accomplished through a combination of
[...] found in Interactive Voice Response [...] applications),
web browser, and control [...] software that provides a
[...] page, and a mechanism for driving both the local, as well
as [...] browsing the Internet to a web
page that continually carries the quotes. Once at the web
page, the user can activate the present invention, telling the
speech server to, for example, "mark this" or "show me the
stock quote". The server can then be set to either tell the user
the stock price or go to that web page upon recognizing
the selected speech pattern.

[...] operating over the telephone network to any web browser
operating over the Internet. This link enables the user's web
browser to be controlled by the remote speech recognition
device, and, in turn, enables telephony functions to be
controlled by any web browser. In addition to providing an
immediate solution to accessing the web by voice, the
invention provides tools and motivation for web page
authors to generate web pages that are tailored to
speech-only interfaces. This is expected to transform the
nature of the web, and, over time, to support a truly
multi-modal interface with the Internet.

The significance of the invention is that it provides both
a means for immediately speech-enabling the Internet and a
means for gradually Internet-enabling the telephone system.
Other [...] the problem of linking
speech technology and the Internet from either one
perspective or the other (that is, speech-enabling the net or
net-enabling the telephone). The approach of the present
invention, however, can be viewed from either perspective,
and, in so doing, leads to an immediate speech-enabling of
the Internet, and to a process of Internet-enabling the
telephone. In addition, the present invention leads to
functionality completely unobtainable from either of the other
approaches taken alone.

The control of both the server's web browser and the
user's remote web browser also enables an optional GUI for
the user of the speech/web server. The GUI link is not
required for the system to operate; however, because the web
is currently graphically-oriented, the ability to use the local
web browser as a GUI for the speech-driven browser is
expected to be beneficial when surfing the web by voice. The
concept of a telephony-based web browser with an optional
GUI constitutes a significant attribute of the system because
it provides a common platform that can be used for simple
applications by anyone with a telephone. In addition, it can
be used for more difficult tasks when a PC or workstation is
available to the user.

Another example of the use of the present invention
pertains to speech input and output over telephone lines as
the additional modality that can be linked to the
conventional web browser interface. Thus, rather than placing a
call, hanging up, and placing another call, a user will be able
to browse using the telephone. This browsing includes such
activities as seamlessly speaking to one person, and then
connecting to another and then checking messages and
[...] browsers that understand and speak other languages,
or even [...]

Additional objects, advantages and novel features of the
invention will be set forth in part in the description which
follows, and in part will become more apparent to those
skilled in the art upon examination of the following or may
[...] means of the instrumentalities and combinations
particularly pointed out in the appended claims.

To achieve the stated and other objects of the present
invention, as embodied and described below, the invention
may comprise the steps of:

accessing a voice recognition server through a voice
transmission device;

[...] signals; and

using said translated voice transmissions to perform
functions on the Internet via voice translation being
performed by said server.

DESCRIPTION OF THE DRAWINGS

A block diagram of the invention is shown in FIG. 1.

FIG. 2 shows a user that happens across the web page
containing connection information on the present invention
[...] of speech enabling his or her web
browser using the preferred embodiment.

FIG. 3 illustrates the exchange of information necessary
to speech enable a web browser.

FIG. 4 shows the connections in place for operation of the
preferred embodiment.

FIG. 5 illustrates all of the components of the system in
operation.

FIG. 6 contains an alternative embodiment, in which the
local web browser is a slave to the speech/web server.

FIG. 7 contains a second alternative embodiment, in
which the speech/web server is a slave to the local web
browser.

DETAILED DESCRIPTION OF THE
PREFERRED EMBODIMENT

[...] drawings, the preferred embodiment of the
[...] will now be explained.

[...] FIG. 1. A local web browser 1, such as Netscape on a
PC, is used to [...] 3. The local web browser 1 contains an
[...] Protocol (TCP) link [...] link 5 with an ASTP
controller 6 [...] browser 7 [...] a Pentium processor-based
PC running [...] or a separate PC coupled [...] coupled 10
to the ASTP controller 6. These couples can [...]
connections as an electronic circuit, a fiber [...]
electromagnetic signal, or any other means of [...] art. A
Dialogic line card located in the [...] telephone network 12.
The speech/web browser [...] off-the-shelf web [...]
described below [...]
language, such as JAVA, that allows programs to run
within a web browser, such as Netscape. However, since the
speech/web browser 7 is driven by speech-only, it is always
run in text-only mode. This gives it a considerable response
time advantage over a browser that must download and
display graphics. The time normally devoted to graphics can
thus be used by the recognizer (speech server 9) to compile
the grammar for the new web page.

The speech server 9 is typical of those used for IVR and
[...] systems vary considerably
in the number of simultaneous channels of speech
recognition they can support, but are most often built from
off-the-shelf components that plug into a PC (AT bus). A
typical configuration for a speech server would be a Pentium
class PC running UNIX or Windows NT, loaded with a
speech recognizer such as ALTech, PureSpeech, or Nuance,
with a Dialogic line card capable of handling multiple
simultaneous telephone lines, and two speech recognition
boards, each with four channels of recognition. Speech
output is either from pre-recorded prompts or a speech
synthesizer. The telephone line card enables the system to
dial out, receive calls, and to conference calls.

The ASTP software 4 and 6 is the heart of the system. As
noted, this software is written and distributed as a plug-in
module to Netscape or other browsers and is written in a
typical language that can operate in Netscape, such as JAVA.
The protocol is a superset of the Common Client Interface
(CCI), which provides the mechanism for establishing a
persistent link between the speech/web browser 7 and the
user's browser (local web browser 1). The persistent link
enables the speech/web browser 7 to remotely control the
user's web browser 1, the user's web browser 1 to control
the speech/web browser 7, and also allows the two browsers
to browse the web in tandem.

In addition to the CCI-like capability, the ASTP protocols
provide the interface to the speech server 9, telling the
recognizer what grammar to compile for the next web page.
This function is typically fulfilled by simply stripping the
text associated with each hotlink and sending it to the
recognizer's grammar compiler. Alternatively, versions of
the protocol support calls to high-level routines, called
"speech behaviors", that handle all of the dialog between the
user and the machine. These high-level routines allow users
to supply, by voice, specific kinds of information when using
the Internet, such as credit card numbers, addresses, and
telephone numbers. By providing web page authors with
access to well-designed dialog modules that can be easily
deployed through simple-to-use web authoring tools, such as
the ASTP protocols, the predominately graphical nature of
the web changes to accommodate a speech-only,
telephone-based interface.

Finally, the ASTP link 5 is what provides the conduit
between the web page and the telephone. This allows web
authors to include telephone numbers associated with
hotlinks that can be dialed by the speech/web server 8. This
capability may change how switching is currently done in
the telephone network 12.

FIG. 2 shows how a user that happens across the web page
containing connection information on the present invention
initiates the process of speech enabling his or her web
browser using the [...]
local web browser 16 initiates a TCP connection 17 with the
speech/web site, which is served by the speech/web server
18, by selecting a hotlink such as "surf the web by voice" at
the web site.

In FIG. 3, user 15 of a local web browser 16 and local
telephone 19 uploads 20 to the speech/web server 18 from
the local web browser 16 the local telephone number and
downloads 21 the ASTP plug-in from the speech/web server
18. In FIG. 4, the user 15 of the local web browser 16 and
local telephone 19 simultaneously connects by ASTP
connection 17 and by telephone connection 22 with the
speech/web server 18.

The setup of the preferred embodiment is now completed,
as shown in FIG. 5. The user 15 of the local web browser 16
and local telephone 19 simultaneously communicates with
the speech/web server 18 via ASTP connection 17 and
telephone connection 22. The user 15 is also connected by
a TCP link 25 to other web servers 24 simultaneously 26
with the speech/web server 18 connection by a TCP link 23
to the other web servers 24.

As a result of these simultaneous links 26, the user can
browse the Internet using voice while looking at the screen
of the local web browser 16 and speaking over the phone 19.
Typically, these links allow a user to speak into the phone
using words within the system's capability. These words are
recognized and interpreted by the speech/web browser
located at the speech/web server 18 and translated into a
TCP link 23 command for the speech/web browser at the
speech/web server 18. At the same time, the ASTP supplies
the same TCP link command 17 on the local web browser
16. Thus, the user 15 speaks to control browsing of the
Internet.

A significant advantage of the preferred embodiment is
[...] responsiveness. [...]
speech/web server to generate grammars while the user's
browser is busy displaying graphics. A second advantage
is that neither of the web browsers needs to be modified for
the system to work.

Variations and Modifications

Two variations on the invention are illustrated in FIGS. 6
and 7. [...] approaches differ from the one described in
FIG. 1 in that they require only a single link into the Internet,
[...] links described previously.

In the method shown in FIG. 6, local web browser 1
with ASTP plug-in 4 is linked 5 to an ASTP controller 6
located within a speech/web browser 7 housed within a
Pentium processor PC-based speech/web server 8. This PC
is typically running Windows. This PC also hosts, or a
separate PC coupled to the speech/web server 8 hosts, the
speech server 9, which is coupled 10 to the ASTP controller
6. The speech server 9 is linked 11 to a telephone network
12. The speech/web browser 7 is also TCP linked 13 to the
Internet 2.

The primary difference between this alternative and the
earlier embodiment (FIG. 1) is that a direct link 13 does not
exist between the speech/web browser 7 and the Internet 2
simultaneous with a link between the local web browser 1
and the Internet 2 (link 3 of FIG. 1).

In the method shown in FIG. 7, local web browser 1
[...] 5 to an ASTP controller 6
[...] speech/web browser 7 housed within a
Pentium processor-based PC speech/web server 8. This PC
also hosts, or a separate PC coupled to the speech/web server
8 hosts, the speech server 9, which is coupled 10 to the ASTP
controller 6. The speech server 9 is linked 11 to a telephone
network 12. The local web browser 1 is also TCP linked 3
to the Internet 2.

The primary difference between this alternative and the
earlier embodiment (FIG. 1) is that a direct link does not
exist between the speech/web browser 7 and the Internet 2
(link 13 of FIG. 1) simultaneous with a link 3 between the
local web browser 1 and the Internet 2.
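
The grammar step described above (stripping the text associated with each hotlink and handing it to the recognizer) and the tandem control of both browsers can be illustrated with a minimal sketch. It is an illustration under stated assumptions, not the ASTP implementation: the description does not specify the ASTP message format, so the names `HotlinkCollector`, `compile_grammar`, and `dispatch` are hypothetical, the "grammar" is simply a phrase-to-URL table, and each browser is modelled as a plain list that receives commands. HTML parsing uses Python's standard `html.parser` module.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class HotlinkCollector(HTMLParser):
    """Collect (anchor text, absolute URL) pairs from an HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []      # finished (phrase, url) pairs
        self._href = None    # href of the <a> element currently open, if any
        self._text = []      # text fragments seen inside that element

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self._href = urljoin(self.base_url, href)
                self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            # Normalise whitespace and case so spoken phrases match reliably.
            phrase = " ".join("".join(self._text).split()).lower()
            if phrase:
                self.links.append((phrase, self._href))
            self._href = None


def compile_grammar(html, base_url):
    """Return a phrase -> URL table for the hotlinks on one page (hypothetical helper)."""
    collector = HotlinkCollector(base_url)
    collector.feed(html)
    collector.close()
    return dict(collector.links)


def dispatch(phrase, grammar, browsers):
    """Send the same 'open URL' command to every linked browser (hypothetical helper)."""
    url = grammar.get(" ".join(phrase.split()).lower())
    if url is None:
        return None          # phrase is outside this page's vocabulary
    for browser in browsers:
        browser.append(("open", url))
    return url


if __name__ == "__main__":
    page = '<p><a href="/quotes">Stock  Quotes</a> <a href="news.html">News</a></p>'
    grammar = compile_grammar(page, "http://example.com/index.html")
    remote_browser, local_browser = [], []
    dispatch("stock quotes", grammar, [remote_browser, local_browser])
    print(grammar)
    print(remote_browser, local_browser)
```

Because the vocabulary is rebuilt for every page, the recognizer only has to distinguish among the phrases that are actually selectable on that page, which is consistent with the small, page-specific grammars the description implies. Issuing one command to both browser queues mirrors the statement that the same TCP link command is supplied to the speech/web browser and to the local web browser.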

What is claimed is: 

1. A remote server to enable a local user to increase the 
functionality of a local browser having a graphical user 
interface, comprising: 

a remote web browser residing on the remote server; 

a speech controller electronically coupled to said remote 
web browser, said controller being configured to form 
control links coupling the local browser to said remote 
browser via an Internet data communication link to 
enable said remote web browser and the local browser 
to function cooperatively; and 

a speech server having a speech recognition function 
residing on the remote server, said speech server cou- 
pling said controller to a telephone network so that a 
telephonic voice communication link may be estab- 
lished between the user and said controller; 

wherein voice commands to control browsing may be 
input via said telephonic voice communication link and 
wherein graphical user interface commands to control
browsing may also be input via the local browser. 

2. The server of claim 1, wherein said controller and said 
server are configured to form said telephonic voice commu- 
nication link in response to the user accessing a web site via 
said Internet data communication link. 

3. The server of claim 1, wherein said control links are
configured to enable the local browser to control the tele- 
phonic function of said speech server. 

4. The server of claim 1, wherein said controller is a 
software module contained in said remote browser. 

5. The server of claim 4, wherein said controller is 
configured to download a software program to the local 
browser to form persistent control links. 

6. A remote server to enable a local user to increase the 
functionality of a local browser, comprising: 

a remote web browser residing on the remote server; 

a speech controller electronically coupled to said remote 
web browser, said controller being configured to form 
control links coupling the local browser to said remote 
browser via an Internet data communication link to 
enable said remote web browser and the local browser 
to function cooperatively; and 

a speech server having a speech recognition function 
residing on the remote server, said speech server cou- 
pling said controller to a telephone network so that a 
voice communication link may be established between 
the user and said controller; 

wherein said control links are configured to enable voice 
commands to be uploaded to control the browsing
function while information from the Internet is down- 
loaded to a graphical user interface of the local browser. 

7. The server of claim 6, wherein said control links are 
configured so that the user may browse by both voice 
commands and by inputting commands via said graphical 
user interface. 

8. A network system, comprising: 

a) a local browser disposed on a local computer; and 

b) a remote server including: 

i) a remote browser residing on said remote server; 

ii) a speech controller software module electronically 
coupled to said remote browser; and 
iii) a speech server having a speech recognition function
residing on the remote server, said speech server
coupling said speech controller software module to a 
telephone network so that a voice communication 
link may be established between the user and said 
speech controller software module; 
said controller software module having an interface pro- 
tocol for remotely controlling web browsers configured 
to form control links coupling said local browser to said 
remote browser via a network data link to enable said 
remote web browser and said local browser to function 
cooperatively, wherein said control links are configured 
so that auxiliary voice commands may be input by the 
user to control browsing of the network. 

9. The system of claim 8, wherein said local browser 
includes a graphical user interface and said control links are 
configured so that the user may browse by both voice 
commands and by inputting commands via said graphical 
user interface. 

10. A network system, comprising: 

a) a local browser disposed on a local computer; and 

b) a remote server including: 

i) a remote browser residing on said remote server; 

ii) a speech controller software module electronically 
coupled to said remote browser, said controller soft- 
ware module being configured to form control links 
coupling said local browser to said remote browser 
via a network data link to enable said remote web 
browser and said local browser to function coopera- 
tively; and 

iii) a speech server having a speech recognition func- 
tion residing on the remote server, said speech server 
coupling said speech controller software module to a 
telephone network so that a voice communication 
link may be established between the user and said 
speech controller software module; 

wherein said control links are configured to enable voice 
commands to be uploaded to control the browsing 
function while information from the network is down- 
loaded to the graphical user interface of said local 
browser. 

11. The system of claim 10, wherein said controller
software module includes an interface protocol for remotely 
controlling a web browser. 

12. A method for permitting a local user to link a local 
web browser to a remote speech recognition device, com- 
prising the steps of: 

a) electronically coupling the local browser to a web-site 
served by a remote server; 

b) downloading a software program from a remote web 
browser residing on said remote server to form control 
links between the local web browser and a controller 
coupled to said remote web browser; and 

c) telephoning the user to form a voice communication 
link between the user and said controller via a speech 
server coupling said controller to a telephone network; 

whereby the user may input voice commands which are 
translated by said speech server to control browsing of 
a computer network while information from the net- 
work is downloaded to a graphical user interface of the 
local browser. 

13. The method of claim 12, further comprising after step
"b" the step of: uploading the phone number of the local
user.

14. The method of claim 12, wherein said controller 
software module is contained in said remote web browser. 
15. A method for permitting a local user to use voice
commands to perform functions on a network, comprising 
the steps of: 

a) providing a remote server, the remote server having a 
controller for forming a first data communication link
with a local user and a speech server for converting 
voice commands into control signals; 

b) accessing said remote server to form a first electronic 
communication link to a local browser; 

c) telephoning the user to form a voice transmission 
communication link coupling the user to the controller 
via said speech server; 

d) translating voice commands into electronic data signals 
using said speech server; and

e) using said translated voice commands to perform 
functions on the network; 

wherein said controller is configured to enable voice 
commands to be uploaded to control the browsing 
function while information from the network is downloaded
to a graphical user interface of the local browser.



16. The method of claim 15, wherein the network com- 
prises the Internet and further wherein the controller is 
contained in a remote browser residing on said remote 
server. 

17. The method of claim 16, wherein said speech server 
is coupled to a telephone network and further comprising 
after step "b" the step of: 

uploading a local telephone number. 

18. The method of claim 15, further comprising the step
of:

downloading a software program to said local browser to 
enable a persistent link to be formed between the 
controller and the local browser. 

19. The method of claim 15, wherein said speech server 
is coupled to a telephone network and further comprising the 
step of: 

accessing a hot-linked phone number on a web-site to 
initiate dialing of said phone number by said speech 



